refactor(clp-s)!: Rework command line argument parsing for output destinations to improve extensibility. by gibber9809 · Pull Request #2229 · y-scope/clp

gibber9809 · 2026-04-28T21:21:55Z

Description

This PR reworks how we parse arguments for output destinations during search in order to make it easy to support having multiple output destinations in a follow-up PR.

Each output destination is treated like a subcommand, e.g. <output-destination-type> <output-destination-arguments>. Previously, when supporting only one output destination, we would simply collect the unrecognized options after parsing the main search options and hand those options to an options_description specific to the relevant output destination. With multiple output destinations though we need to do a little bit of extra work to figure out which unrecognized options are related to each subcommand.

This PR solves that problem by parsing the unrecognized options left to right, using both the known output destination names and the options_description object for each subcommand to attempt to find the options for each output destination. After this step the options can be parsed with each destination-specific options_description like normal. We can't simply iteratively use each options_description with allow_unregistered to parse the full list of options, because different output destinations share options with the same name (e.g. --host is used both by reducer and network), so some combinations of output destinations could result in throwing errors because of repeated options. As well, we can't simply rely on splitting up the options for each subcommand based solely on the ouptut destination names, because the output destination names can also appear as arguments (e.g. in the package, because of the way our DNS is set up, a reducer query will look like clp-s s <archive> <query> reducer --host reducer --port <port> --job-id <id> --count.

This PR also refactors the CommandLineArguments class to group together options for each output destination into structs.

This PR includes a small functional change, in that aggregation options are now part of the reducer output destination options. This changes how aggregation options must be specified on the command line (i.e. they explicitly need to be part of the reducer output destination arguments), but doesn't represent any change in functionality since aggregation options are only allowed for the reducer output destination anyway.

Checklist

The PR satisfies the contribution guidelines.
This is a breaking change and that has been indicated in the PR title, OR this isn't a
breaking change.
Necessary docs have been updated, OR no docs need to be updated.

Validation performed

Confirmed that package search through webui and cli work as expected
Manually validated that options are split as expected when passing multiple output destinations (before the command is rejected for having multiple output destinations)

Summary by CodeRabbit

Bug Fixes
- Stricter validation and error handling for output handler arguments, including treating stdout as no-options handler.
- Search now rejects unsupported reducer usage in certain output scenarios.
Refactor
- Command-line argument handling restructured to provide per-output-handler option groups and typed option objects.
- Aggregation flags moved to reducer-specific handler scope and repositioned in constructed reducer commands.

…multiple output handlers.

…dler-options

…parate options related to a given subcommand.

coderabbitai · 2026-04-28T21:22:07Z

Walkthrough

Output handler configuration moved from many individual getters to typed option structs; CLI parsing now groups trailing tokens into per-handler subcommands and requires exactly one handler. Aggregation flags were relocated into reducer-specific options and validation; runtime code and command assembly updated to use the new option model.

Changes

Output handler & CLI parsing

Layer / File(s)	Summary
Option Types / API `components/core/src/clp_s/CommandLineArguments.hpp`	Adds `AggregationType` enum and four output-handler option structs; replaces many scalar getters with `get_*_output_handler_options()` getters; updates helper signatures to take `std::vector<std::string> const&` and adjusts includes.
CLI Parsing / Subcommand Splitting `components/core/src/clp_s/CommandLineArguments.cpp`	Parses unrecognized trailing tokens into per-output-handler “subcommand” argument groups; requires exactly one handler; implements `parse_*_output_handler_options` per handler; moves `--count`/`--count-by-time` into reducer option parsing and validates reducer-specific constraints; treats `stdout` as handler with no options.
Application Wiring `components/core/src/clp_s/clp-s.cpp`	Creates output handlers and configures reducer aggregation by consuming the new option structs; switches aggregation branching to `options.aggregation_type`; localizes option variables in case blocks.
Search Aggregation Check `components/core/src/clp_s/kv_ir_search.cpp`	Changes condition to return `CountSupportNotImplemented` when selected output handler is `Reducer` rather than based on former aggregation-flag methods.
Command Assembly for Jobs `components/job-orchestration/job_orchestration/executor/query/fs_search_task.py`	Appends aggregation flags (`--count`, `--count-by-time` and bucket size) after reducer argument block when building the reducer command.
Tests/Docs/Support `components/core/src/clp_s/*`	Parsing helpers, validation logic, and related doc comments updated alongside the new structured option model.

Build Configuration

Layer / File(s)	Summary
CMake sources `components/core/src/clp_s/CMakeLists.txt`	Removes `../clp/cli_utils.cpp` and `../clp/cli_utils.hpp` from `CLP_S_CLP_SOURCES` list (no other CMake changes).

Sequence Diagram(s)

sequenceDiagram
    participant CLI as CLI (argv)
    participant Parser as CommandLineArguments
    participant App as clp-s
    participant Search as kv_ir_search
    participant Job as fs_search_task.py
    CLI->>Parser: provide argv (including trailing handler tokens)
    Parser->>Parser: split trailing tokens into handler subcommand args
    Parser-->>App: return selected OutputHandlerOptions (exactly one)
    App->>Search: initiate search with selected handler type
    Search-->>App: responds with aggregation support check (uses handler type)
    App->>Job: when dispatching reducer jobs, pass reducer args then append aggregation flags

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~65 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 20.83% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately summarizes the main change: refactoring command-line argument parsing for output destinations to improve extensibility, which is the core objective throughout all modified files.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 Ruff (0.15.12)

components/job-orchestration/job_orchestration/executor/query/fs_search_task.py

Unexpected Ruff issue shape at index 6

Tip

💬 Introducing Slack Agent: The best way for teams to turn conversations into code.

Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.

Generate code and open pull requests
Plan features and break down work
Investigate incidents and troubleshoot customer tickets together
Automate recurring tasks and respond to alerts with triggers
Summarize progress and report instantly

Built for teams:

Shared memory across your entire org—no repeating context
Per-thread sandboxes to safely plan and execute work
Governance built-in—scoped access, auditability, and budget controls

One agent for your entire SDLC. Right inside Slack.

👉 Get started

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@components/core/src/clp_s/CommandLineArguments.cpp`:
- Around line 184-211: The loop currently defaults
get_max_tokens_for_option(...).value_or(0) to 0 which lets an option with
unknown arity cause the next token (which might match a subcommand name) to be
misinterpreted as a subcommand; change the logic in the branch that sets
remaining_tokens_in_current_argument so that when
get_max_tokens_for_option(current_subcommand_options_description, option)
returns std::nullopt you handle it explicitly (e.g., throw
std::invalid_argument("unrecognized option \"...\"") or otherwise treat it as an
error) instead of silently defaulting to 0; update code around
remaining_tokens_in_current_argument, current_subcommand,
current_subcommand_arguments and collect_subcommand to rely on that explicit
handling.
- Around line 993-995: The thrown std::invalid_argument in
CommandLineArguments.cpp currently says "clp-s only supports one output handler
at a time"; change this to include the actual detected handlers from
output_options_map so users see which handlers were passed. Locate the check
using output_options_map (the one that throws when output_options_map.size() >
1) and build a descriptive message by joining the map keys / handler names
(e.g., iterate output_options_map and collect the keys into a comma-separated
string) and include that string in the exception text so the thrown message
shows the list of detected handlers.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: c84e61f7-19bf-4f9e-90b9-3b0851f0bfab

📥 Commits

Reviewing files that changed from the base of the PR and between aa28a0c and 359ab95.

📒 Files selected for processing (6)

components/core/src/clp_s/CMakeLists.txt
components/core/src/clp_s/CommandLineArguments.cpp
components/core/src/clp_s/CommandLineArguments.hpp
components/core/src/clp_s/clp-s.cpp
components/core/src/clp_s/kv_ir_search.cpp
components/job-orchestration/job_orchestration/executor/query/fs_search_task.py

💤 Files with no reviewable changes (1)

components/core/src/clp_s/CMakeLists.txt

coderabbitai · 2026-04-28T21:26:37Z

+            if (output_options_map.size() > 1) {
+                throw std::invalid_argument("clp-s only supports one output handler at a time");
            }


🧹 Nitpick | 🔵 Trivial

Consider clarifying the single-handler constraint.

The error message could be more helpful by listing which handlers were detected. This would aid debugging when users accidentally specify multiple handlers.

💡 Optional enhancement for better error diagnostics

if (output_options_map.size() > 1) { - throw std::invalid_argument("clp-s only supports one output handler at a time"); + std::string handlers; + for (auto const& [name, _] : output_options_map) { + if (false == handlers.empty()) { + handlers += ", "; + } + handlers += name; + } + throw std::invalid_argument(fmt::format( + "clp-s only supports one output handler at a time, but found: {}", + handlers + )); }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@components/core/src/clp_s/CommandLineArguments.cpp` around lines 993 - 995, The thrown std::invalid_argument in CommandLineArguments.cpp currently says "clp-s only supports one output handler at a time"; change this to include the actual detected handlers from output_options_map so users see which handlers were passed. Locate the check using output_options_map (the one that throws when output_options_map.size() > 1) and build a descriptive message by joining the map keys / handler names (e.g., iterate output_options_map and collect the keys into a comma-separated string) and include that string in the exception text so the thrown message shows the list of detected handlers.

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@components/job-orchestration/job_orchestration/executor/query/fs_search_task.py`:
- Around line 154-155: The code currently appends "--count" when
aggregation_config.do_count_aggregation is not None, which incorrectly includes
False; change the guard to only append the flag for a truthy value (e.g., use if
aggregation_config.do_count_aggregation: or if
aggregation_config.do_count_aggregation is True:) so that --count is only added
when do_count_aggregation is true; update the check around the
command.append("--count") in the code that constructs the command (refer to
aggregation_config.do_count_aggregation and the command list where
command.append("--count") is called).

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: d58fee1a-8c86-4133-bfe9-aded68e6283d

📥 Commits

Reviewing files that changed from the base of the PR and between 359ab95 and feb23c7.

📒 Files selected for processing (2)

components/core/src/clp_s/CMakeLists.txt
components/job-orchestration/job_orchestration/executor/query/fs_search_task.py

💤 Files with no reviewable changes (1)

components/core/src/clp_s/CMakeLists.txt

coderabbitai · 2026-05-11T14:33:24Z

+        if aggregation_config.do_count_aggregation is not None:
+            command.append("--count")


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash set -euo pipefail echo "== Find aggregation config model definition ==" rg -n -C3 'do_count_aggregation|count_by_time_bucket_size|class .*Aggregation' --type=py echo echo "== Find assignments/usages that may pass False ==" rg -n -C2 'do_count_aggregation\s*=' --type=py echo echo "== Find command construction sites for --count ==" rg -n -C3 '"--count"|\'\''--count'\''' --type=py

Repository: y-scope/clp

Length of output: 11193

🏁 Script executed:

#!/bin/bash set -euo pipefail echo "== Check for explicit False assignments to do_count_aggregation ==" rg -n 'do_count_aggregation\s*[:=]\s*False' --type=py echo echo "== Check instantiation patterns for AggregationConfig ==" rg -n -A2 'AggregationConfig(' --type=py | head -100 echo echo "== Check the full context of where --count flag is constructed ==" rg -n -B5 -A1 'append.*--count' --type=py

Repository: y-scope/clp

Length of output: 122

The --count flag guard should check truthiness, not non-None.

Line 154 currently adds --count whenever do_count_aggregation is not None, which includes when it is explicitly False. Since the field is typed as bool | None, False is a valid distinct state, and the flag should only be appended for truthy values.

Proposed fix

- if aggregation_config.do_count_aggregation is not None: + if aggregation_config.do_count_aggregation: command.append("--count")

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

if aggregation_config.do_count_aggregation is not None:

command.append("--count")

if aggregation_config.do_count_aggregation:

command.append("--count")

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@components/job-orchestration/job_orchestration/executor/query/fs_search_task.py` around lines 154 - 155, The code currently appends "--count" when aggregation_config.do_count_aggregation is not None, which incorrectly includes False; change the guard to only append the flag for a truthy value (e.g., use if aggregation_config.do_count_aggregation: or if aggregation_config.do_count_aggregation is True:) so that --count is only added when do_count_aggregation is true; update the check around the command.append("--count") in the code that constructs the command (refer to aggregation_config.do_count_aggregation and the command list where command.append("--count") is called).

LinZhihao-723

I don't have comments on the parsing logic. I think how we store the parsed options is the most important decision to make in this PR, since it affects how C-APIs will be written.
One way is to do the variant thing I mentioned in another comment.
The other way is we expose different entry points for search, each takes a particular config option type and initializes the output handler accordingly.
I think for now, it makes more sense to use the variant solution as it keeps the C++ version cleaner: one config variant variable instead of one variant enum + four config objects stored inside the master config.

LinZhihao-723 · 2026-05-15T23:02:18Z

From the API design perspective (to make it future-proof), I think it might make sense if we:

Make the output handler config a variant of the four different types.

The master config only contains one variant variable, and it must be one of the four.

Replace the switch statement with a variant visitor.

LinZhihao-723 · 2026-05-15T23:03:04Z

Didn't review the coding-style thing... the functionality looks good to me.

gibber9809 added 5 commits April 24, 2026 17:32

Clean up parameter grouping for various output handlers.

a0f1a91

Refactor output handler options parsing to make it easier to support …

530dd7e

…multiple output handlers.

Merge remote-tracking branch 'upstream/main' into refactor-output-han…

4ed2cf5

…dler-options

Group aggregation options with reducer output handler options.

0984f72

Use the options_description for each subcommand to more accurately se…

359ab95

…parate options related to a given subcommand.

gibber9809 requested a review from a team as a code owner April 28, 2026 21:21

coderabbitai Bot reviewed Apr 28, 2026

View reviewed changes

gibber9809 requested a review from LinZhihao-723 May 3, 2026 21:53

Merge branch 'main' into refactor-output-handler-options

feb23c7

coderabbitai Bot reviewed May 11, 2026

View reviewed changes

junhaoliao force-pushed the main branch from 4505f8c to 8977316 Compare May 13, 2026 22:48

LinZhihao-723 requested changes May 15, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

refactor(clp-s)!: Rework command line argument parsing for output destinations to improve extensibility.#2229

refactor(clp-s)!: Rework command line argument parsing for output destinations to improve extensibility.#2229
gibber9809 wants to merge 6 commits into
y-scope:mainfrom
gibber9809:refactor-output-handler-options

gibber9809 commented Apr 28, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Apr 28, 2026 •

edited

Loading

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

coderabbitai Bot Apr 28, 2026

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot May 11, 2026

Uh oh!

LinZhihao-723 left a comment

Uh oh!

LinZhihao-723 May 15, 2026

Uh oh!

LinZhihao-723 May 15, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

		if aggregation_config.do_count_aggregation is not None:
		command.append("--count")

Conversation

gibber9809 commented Apr 28, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Checklist

Validation performed

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Apr 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot Apr 28, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 11, 2026

Choose a reason for hiding this comment

Uh oh!

LinZhihao-723 left a comment

Choose a reason for hiding this comment

Uh oh!

LinZhihao-723 May 15, 2026

Choose a reason for hiding this comment

Uh oh!

LinZhihao-723 May 15, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

gibber9809 commented Apr 28, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Apr 28, 2026 •

edited

Loading